Overview

Dataset statistics

Number of variables: 16
Number of observations: 18249
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 93.3 B

Variable types

NUM: 10
CAT: 5
DATE: 1

Warnings

region has high cardinality: 54 distinct values (High cardinality)
4046 is highly correlated with total volume and 3 other fields (High correlation)
total volume is highly correlated with 4046 and 3 other fields (High correlation)
4225 is highly correlated with total volume and 3 other fields (High correlation)
total bags is highly correlated with total volume and 4 other fields (High correlation)
small bags is highly correlated with total volume and 4 other fields (High correlation)
large bags is highly correlated with total bags and 1 other field (High correlation)
region is uniformly distributed (Uniform)
df_index has 432 (2.4%) zeros (Zeros)
4046 has 242 (1.3%) zeros (Zeros)
4770 has 5498 (30.1%) zeros (Zeros)
large bags has 2371 (13.0%) zeros (Zeros)
xlarge bags has 12048 (66.0%) zeros (Zeros)
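
Each percentage in these warnings is the count divided by the 18249 observations, rounded to one decimal place. The sketch below does two things:

- It recomputes the zero shares from the counts listed above.
- It flags high cardinality using a hypothetical threshold of 50 distinct values. That threshold is an illustrative assumption, not a setting read from this report's configuration.

```python
N_OBS = 18249  # number of observations from the dataset statistics

# Zero counts copied from the warnings list
zero_counts = {
    "df_index": 432,
    "4046": 242,
    "4770": 5498,
    "large bags": 2371,
    "xlarge bags": 12048,
}


def percent(count, total=N_OBS):
    """Share of all observations as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


zero_pct = {name: percent(count) for name, count in zero_counts.items()}

# Hypothetical cut-off for illustration; the report's actual setting may differ.
CARDINALITY_THRESHOLD = 50
distinct_counts = {"type": 2, "year": 4, "month": 12, "day": 31, "region": 54}
high_cardinality = [
    name for name, n in distinct_counts.items() if n > CARDINALITY_THRESHOLD
]

for name, pct in zero_pct.items():
    print(f"{name} has {zero_counts[name]} ({pct}%) zeros")
print("High cardinality:", high_cardinality)
```

With this threshold only region (54 distinct values) is flagged; day (31) is not.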

Reproduction

Analysis started: 2020-09-14 16:09:13.672094
Analysis finished: 2020-09-14 16:09:36.326305
Duration: 22.65 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

ZEROS

Distinct: 53
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.2322319
Minimum: 0
Maximum: 52
Zeros: 432
Zeros (%): 2.4%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 10
Median: 24
Q3: 38
95th percentile: 49
Maximum: 52
Range: 52
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 15.48104475
Coefficient of variation (CV): 0.6388616953
Kurtosis: -1.254364272
Mean: 24.2322319
Median Absolute Deviation (MAD): 14
Skewness: 0.1083337271
Sum: 442214
Variance: 239.6627467
Monotonicity: Not monotonic
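
Several of these statistics follow directly from standard definitions:

- CV is the standard deviation divided by the mean. For df_index, 15.48104475 / 24.2322319 ≈ 0.638862.
- The IQR is Q3 - Q1, here 38 - 10 = 28.
- MAD is the median of the absolute deviations from the median.

A minimal sketch with Python's standard library follows. It assumes the report uses pandas' defaults: the sample standard deviation (n - 1 denominator) and linearly interpolated quantiles. The function name is illustrative.

```python
import statistics


def describe(values):
    """Recompute several of the report's descriptive statistics for a list of numbers.

    Assumes the sample standard deviation (n - 1 denominator) and quartiles
    obtained by linear interpolation between order statistics.
    """
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation
    # method="inclusive" interpolates linearly, like pandas' default quantiles
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # MAD: median of the absolute deviations from the median
    mad = statistics.median(abs(x - median) for x in data)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        "mad": mad,
    }


stats = describe([1, 2, 3, 4, 10])
print(stats["mean"], stats["iqr"], stats["mad"])

# Consistency check on the df_index figures reported above
print(round(15.48104475 / 24.2322319, 6))  # CV
print(38 - 10)  # IQR = Q3 - Q1
```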

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
7 | 432 | 2.4%
11 | 432 | 2.4%
1 | 432 | 2.4%
2 | 432 | 2.4%
3 | 432 | 2.4%
4 | 432 | 2.4%
5 | 432 | 2.4%
6 | 432 | 2.4%
8 | 432 | 2.4%
9 | 432 | 2.4%
Other values (43) | 13929 | 76.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 432 | 2.4%
1 | 432 | 2.4%
2 | 432 | 2.4%
3 | 432 | 2.4%
4 | 432 | 2.4%

Maximum 5 values
Value | Count | Frequency (%)
52 | 107 | 0.6%
51 | 322 | 1.8%
50 | 324 | 1.8%
49 | 324 | 1.8%
48 | 324 | 1.8%

date
Date

Distinct: 169
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 142.6 KiB
Minimum: 2015-01-04 00:00:00
Maximum: 2018-03-25 00:00:00

Histogram with fixed size bins (bins=50)

averageprice
Real number (ℝ≥0)

Distinct: 259
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.40597841
Minimum: 0.44
Maximum: 3.25
Zeros: 0
Zeros (%): 0.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0.44
5th percentile: 0.83
Q1: 1.1
Median: 1.37
Q3: 1.66
95th percentile: 2.11
Maximum: 3.25
Range: 2.81
Interquartile range (IQR): 0.56

Descriptive statistics

Standard deviation: 0.4026765555
Coefficient of variation (CV): 0.2864030861
Kurtosis: 0.3251958507
Mean: 1.40597841
Median Absolute Deviation (MAD): 0.28
Skewness: 0.5803027379
Sum: 25657.7
Variance: 0.1621484083
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
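
A fixed-size histogram splits the data range into 50 equal-width bins. Assuming the bins span the observed minimum to maximum, each averageprice bin is 2.81 / 50 = 0.0562 wide.

The pure-Python sketch below follows the convention of numpy.histogram: every bin is half-open except the last, which also includes the maximum. The function name is illustrative. A value lying exactly on an interior edge may land in a neighbouring bin because of floating-point rounding.

```python
def equal_width_histogram(values, bins=50):
    """Count values in `bins` equal-width bins spanning min(values) to max(values).

    Bins are half-open [left, right) except the last, which also includes the
    maximum (the same convention as numpy.histogram).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # If all values are equal (width == 0), put everything in the first bin.
        i = int((x - lo) / width) if width > 0 else 0
        # The maximum maps to index `bins`; clamp it into the last bin.
        counts[min(i, bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges


counts, edges = equal_width_histogram([0, 1, 2, 3, 4], bins=2)
print(counts, edges)

# Bin width for averageprice, from its reported minimum and maximum
print(round((3.25 - 0.44) / 50, 4))
```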
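
Common values
Value | Count | Frequency (%)
1.15 | 202 | 1.1%
1.18 | 199 | 1.1%
1.08 | 194 | 1.1%
1.26 | 193 | 1.1%
1.13 | 192 | 1.1%
0.98 | 189 | 1.0%
1.19 | 188 | 1.0%
1.36 | 187 | 1.0%
1.59 | 186 | 1.0%
0.99 | 185 | 1.0%
Other values (249) | 16334 | 89.5%

Minimum 5 values
Value | Count | Frequency (%)
0.44 | 1 | < 0.1%
0.46 | 1 | < 0.1%
0.48 | 1 | < 0.1%
0.49 | 2 | < 0.1%
0.51 | 5 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
3.25 | 1 | < 0.1%
3.17 | 1 | < 0.1%
3.12 | 1 | < 0.1%
3.05 | 1 | < 0.1%
3.04 | 1 | < 0.1%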

total volume
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 17137
Distinct (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 850643.523
Minimum: 84
Maximum: 62505646
Zeros: 0
Zeros (%): 0.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 84
5th percentile: 2371.8
Q1: 10838
Median: 107376
Q3: 432962
95th percentile: 3716314.8
Maximum: 62505646
Range: 62505562
Interquartile range (IQR): 422124

Descriptive statistics

Standard deviation: 3453545.36
Coefficient of variation (CV): 4.059920832
Kurtosis: 92.10445761
Mean: 850643.523
Median Absolute Deviation (MAD): 102962
Skewness: 9.007687467
Sum: 1.552339365e+10
Variance: 1.192697555e+13
Monotonicity: Not monotonic
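
A skewness of 9.0 and a kurtosis of 92.1 describe a long right tail. The mean (850643.5) sits far above the median (107376).

pandas documents Series.skew and Series.kurt as unbiased estimators, with kurtosis using Fisher's definition so that a normal distribution scores 0.0. The sketch below uses the standard bias-adjusted formulas for those two quantities. That the report computes its figures through pandas is an assumption, and the function name is illustrative.

```python
import math


def sample_skew_kurt(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    # Central moments with a 1/n denominator
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


print(sample_skew_kurt([1, 2, 3, 4, 5]))   # symmetric: skewness 0, excess kurtosis about -1.2
print(sample_skew_kurt([1, 1, 1, 1, 10]))  # long right tail: positive skewness
```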

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
5676 | 5 | < 0.1%
1200 | 5 | < 0.1%
2478 | 5 | < 0.1%
1643 | 4 | < 0.1%
7422 | 4 | < 0.1%
4984 | 4 | < 0.1%
3885 | 4 | < 0.1%
8676 | 4 | < 0.1%
3614 | 4 | < 0.1%
2807 | 4 | < 0.1%
Other values (17127) | 18206 | 99.8%

Minimum 5 values
Value | Count | Frequency (%)
84 | 1 | < 0.1%
379 | 1 | < 0.1%
385 | 1 | < 0.1%
419 | 1 | < 0.1%
472 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
62505646 | 1 | < 0.1%
61034457 | 1 | < 0.1%
52288697 | 1 | < 0.1%
47293921 | 1 | < 0.1%
46324529 | 1 | < 0.1%

4046
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 12877
Distinct (%): 70.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 293007.9443
Minimum: 0
Maximum: 22743616
Zeros: 242
Zeros (%): 1.3%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 19
Q1: 854
Median: 8645
Q3: 111020
95th percentile: 1263359.6
Maximum: 22743616
Range: 22743616
Interquartile range (IQR): 110166

Descriptive statistics

Standard deviation: 1264989.081
Coefficient of variation (CV): 4.31725182
Kurtosis: 86.80911253
Mean: 293007.9443
Median Absolute Deviation (MAD): 8617
Skewness: 8.648219758
Sum: 5347101975
Variance: 1.600197375e+12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 242 | 1.3%
1 | 67 | 0.4%
2 | 53 | 0.3%
6 | 52 | 0.3%
7 | 47 | 0.3%
3 | 46 | 0.3%
4 | 43 | 0.2%
9 | 36 | 0.2%
5 | 36 | 0.2%
8 | 35 | 0.2%
Other values (12867) | 17592 | 96.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 242 | 1.3%
1 | 67 | 0.4%
2 | 53 | 0.3%
3 | 46 | 0.3%
4 | 43 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
22743616 | 1 | < 0.1%
21620180 | 1 | < 0.1%
18933038 | 1 | < 0.1%
17787611 | 1 | < 0.1%
17076650 | 1 | < 0.1%

4225
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 14985
Distinct (%): 82.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 295154.0807
Minimum: 0
Maximum: 20470572
Zeros: 61
Zeros (%): 0.3%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 103
Q1: 3008
Median: 29061
Q3: 150206
95th percentile: 1303657.2
Maximum: 20470572
Range: 20470572
Interquartile range (IQR): 147198

Descriptive statistics

Standard deviation: 1204120.403
Coefficient of variation (CV): 4.079633254
Kurtosis: 91.94902186
Mean: 295154.0807
Median Absolute Deviation (MAD): 28522
Skewness: 8.942465602
Sum: 5386266818
Variance: 1.449905944e+12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 61 | 0.3%
5 | 29 | 0.2%
16 | 18 | 0.1%
11 | 17 | 0.1%
2 | 17 | 0.1%
8 | 15 | 0.1%
41 | 15 | 0.1%
3 | 14 | 0.1%
44 | 13 | 0.1%
65 | 13 | 0.1%
Other values (14975) | 18037 | 98.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 61 | 0.3%
1 | 12 | 0.1%
2 | 17 | 0.1%
3 | 14 | 0.1%
4 | 2 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
20470572 | 1 | < 0.1%
20445501 | 1 | < 0.1%
20328161 | 1 | < 0.1%
18956479 | 1 | < 0.1%
17896391 | 1 | < 0.1%

4770
Real number (ℝ≥0)

ZEROS

Distinct: 7125
Distinct (%): 39.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22839.39673
Minimum: 0
Maximum: 2546439
Zeros: 5498
Zeros (%): 30.1%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 184
Q3: 6243
95th percentile: 106156
Maximum: 2546439
Range: 2546439
Interquartile range (IQR): 6243

Descriptive statistics

Standard deviation: 107464.0369
Coefficient of variation (CV): 4.705204701
Kurtosis: 132.5635595
Mean: 22839.39673
Median Absolute Deviation (MAD): 184
Skewness: 10.15940119
Sum: 416796151
Variance: 1.154851922e+10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 5498 | 30.1%
1 | 132 | 0.7%
3 | 122 | 0.7%
2 | 101 | 0.6%
4 | 90 | 0.5%
6 | 80 | 0.4%
8 | 72 | 0.4%
9 | 72 | 0.4%
7 | 67 | 0.4%
12 | 66 | 0.4%
Other values (7115) | 11949 | 65.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5498 | 30.1%
1 | 132 | 0.7%
2 | 101 | 0.6%
3 | 122 | 0.7%
4 | 90 | 0.5%

Maximum 5 values
Value | Count | Frequency (%)
2546439 | 1 | < 0.1%
1993645 | 1 | < 0.1%
1896149 | 1 | < 0.1%
1880231 | 1 | < 0.1%
1811090 | 1 | < 0.1%

total bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15963
Distinct (%): 87.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 239638.7163
Minimum: 0
Maximum: 19373134
Zeros: 15
Zeros (%): 0.1%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 628.8
Q1: 5088
Median: 39743
Q3: 110783
95th percentile: 1005478.2
Maximum: 19373134
Range: 19373134
Interquartile range (IQR): 105695

Descriptive statistics

Standard deviation: 986242.3999
Coefficient of variation (CV): 4.115538655
Kurtosis: 112.2721574
Mean: 239638.7163
Median Absolute Deviation (MAD): 37300
Skewness: 9.756071704
Sum: 4373166933
Variance: 9.726740714e+11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 15 | 0.1%
266 | 8 | < 0.1%
923 | 7 | < 0.1%
413 | 7 | < 0.1%
916 | 6 | < 0.1%
880 | 6 | < 0.1%
326 | 6 | < 0.1%
841 | 6 | < 0.1%
884 | 6 | < 0.1%
990 | 6 | < 0.1%
Other values (15953) | 18176 | 99.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 15 | 0.1%
3 | 4 | < 0.1%
6 | 4 | < 0.1%
7 | 2 | < 0.1%
9 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19373134 | 1 | < 0.1%
16394524 | 1 | < 0.1%
16298296 | 1 | < 0.1%
15972492 | 1 | < 0.1%
15804696 | 1 | < 0.1%

small bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 14913
Distinct (%): 81.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 182194.2177
Minimum: 0
Maximum: 13384586
Zeros: 159
Zeros (%): 0.9%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 256
Q1: 2849
Median: 26362
Q3: 83337
95th percentile: 768146.8
Maximum: 13384586
Range: 13384586
Interquartile range (IQR): 80488

Descriptive statistics

Standard deviation: 746178.5104
Coefficient of variation (CV): 4.095511482
Kurtosis: 107.0128857
Mean: 182194.2177
Median Absolute Deviation (MAD): 25599
Skewness: 9.540660024
Sum: 3324862278
Variance: 5.567823694e+11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 159 | 0.9%
223 | 14 | 0.1%
203 | 14 | 0.1%
40 | 13 | 0.1%
123 | 11 | 0.1%
6 | 11 | 0.1%
533 | 10 | 0.1%
20 | 10 | 0.1%
103 | 10 | 0.1%
3 | 10 | 0.1%
Other values (14903) | 17987 | 98.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 159 | 0.9%
2 | 8 | < 0.1%
3 | 10 | 0.1%
4 | 1 | < 0.1%
5 | 4 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
13384586 | 1 | < 0.1%
12567155 | 1 | < 0.1%
12540327 | 1 | < 0.1%
11712807 | 1 | < 0.1%
11392828 | 1 | < 0.1%

large bags
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 10486
Distinct (%): 57.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54337.66541
Minimum: 0
Maximum: 5719096
Zeros: 2371
Zeros (%): 13.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 127
Median: 2647
Q3: 22029
95th percentile: 195699.2
Maximum: 5719096
Range: 5719096
Interquartile range (IQR): 21902

Descriptive statistics

Standard deviation: 243965.9461
Coefficient of variation (CV): 4.489812808
Kurtosis: 117.9994984
Mean: 54337.66541
Median Absolute Deviation (MAD): 2647
Skewness: 9.796455521
Sum: 991608056
Variance: 5.951938285e+10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2371 | 13.0%
3 | 238 | 1.3%
6 | 134 | 0.7%
4 | 77 | 0.4%
10 | 76 | 0.4%
5 | 59 | 0.3%
8 | 57 | 0.3%
13 | 54 | 0.3%
2 | 48 | 0.3%
26 | 41 | 0.2%
Other values (10476) | 15094 | 82.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2371 | 13.0%
1 | 18 | 0.1%
2 | 48 | 0.3%
3 | 238 | 1.3%
4 | 77 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
5719096 | 1 | < 0.1%
4324231 | 1 | < 0.1%
4081397 | 1 | < 0.1%
4023485 | 1 | < 0.1%
3988101 | 1 | < 0.1%

xlarge bags
Real number (ℝ≥0)

ZEROS

Distinct: 3577
Distinct (%): 19.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3106.279029
Minimum: 0
Maximum: 551693
Zeros: 12048
Zeros (%): 66.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 132
95th percentile: 12058.2
Maximum: 551693
Range: 551693
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 17692.83749
Coefficient of variation (CV): 5.695830066
Kurtosis: 233.6046317
Mean: 3106.279029
Median Absolute Deviation (MAD): 0
Skewness: 13.13982169
Sum: 56686486
Variance: 313036498.3
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 12048 | 66.0%
3 | 74 | 0.4%
2 | 69 | 0.4%
6 | 55 | 0.3%
1 | 55 | 0.3%
7 | 52 | 0.3%
5 | 49 | 0.3%
8 | 44 | 0.2%
4 | 40 | 0.2%
15 | 37 | 0.2%
Other values (3567) | 5726 | 31.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 12048 | 66.0%
1 | 55 | 0.3%
2 | 69 | 0.4%
3 | 74 | 0.4%
4 | 40 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
551693 | 1 | < 0.1%
454343 | 1 | < 0.1%
390478 | 1 | < 0.1%
387400 | 1 | < 0.1%
377661 | 1 | < 0.1%

type
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.9 KiB

Common values
Value | Count | Frequency (%)
conventional | 9126 | 50.0%
organic | 9123 | 50.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 12
Median length: 12
Mean length: 9.500410981
Min length: 7
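
The length statistics are taken over all 18249 values, not over the distinct categories:

- For type, the mean length is (9126 × 12 + 9123 × 7) / 18249 ≈ 9.5004 characters. The median is 12 because "conventional" is the slightly more frequent value.
- For month, only "10", "11" and "12" have two characters, so the mean length is 1 + (1512 + 1404 + 1403) / 18249 ≈ 1.2367.

A short sketch that computes these statistics from a value-count table follows. The function name is illustrative.

```python
import statistics


def length_stats(value_counts):
    """Max, median, mean and min string length over all observations.

    Each distinct value contributes its length once per occurrence, so the
    statistics are weighted by the value counts.
    """
    lengths = []
    for value, count in value_counts.items():
        lengths.extend([len(value)] * count)
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


type_counts = {"conventional": 9126, "organic": 9123}
print(length_stats(type_counts))

# month: only "10", "11" and "12" have two characters
print(1 + (1512 + 1404 + 1403) / 18249)
```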

year
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Common values
Value | Count | Frequency (%)
2017 | 5722 | 31.4%
2016 | 5616 | 30.8%
2015 | 5615 | 30.8%
2018 | 1296 | 7.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

region
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 54
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 20.7 KiB

Common values
Value | Count | Frequency (%)
Nashville | 338 | 1.9%
Detroit | 338 | 1.9%
MiamiFtLauderdale | 338 | 1.9%
Louisville | 338 | 1.9%
LosAngeles | 338 | 1.9%
LasVegas | 338 | 1.9%
Jacksonville | 338 | 1.9%
Indianapolis | 338 | 1.9%
Houston | 338 | 1.9%
HartfordSpringfield | 338 | 1.9%
Other values (44) | 14869 | 81.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 19
Median length: 9
Mean length: 10.29535865
Min length: 4

month
Categorical

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.2 KiB

Common values
Value | Count | Frequency (%)
1 | 1944 | 10.7%
3 | 1836 | 10.1%
2 | 1728 | 9.5%
10 | 1512 | 8.3%
7 | 1512 | 8.3%
5 | 1512 | 8.3%
11 | 1404 | 7.7%
8 | 1404 | 7.7%
4 | 1404 | 7.7%
12 | 1403 | 7.7%
Other values (2) | 2590 | 14.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 2
Median length: 1
Mean length: 1.236670502
Min length: 1

day
Categorical

Distinct: 31
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 19.3 KiB

Common values
Value | Count | Frequency (%)
4 | 756 | 4.1%
11 | 756 | 4.1%
18 | 755 | 4.1%
25 | 755 | 4.1%
19 | 648 | 3.6%
8 | 648 | 3.6%
10 | 648 | 3.6%
12 | 648 | 3.6%
15 | 648 | 3.6%
3 | 648 | 3.6%
Other values (21) | 11339 | 62.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 2
Median length: 2
Mean length: 1.710066305
Min length: 1

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign). So for points lying exactly on a non-horizontal line, r is +1 or -1 regardless of the line's slope.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
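
A direct transcription of this definition into Python follows, for illustration only; the report relies on its own library implementation. The examples also show the invariance property: points lying exactly on a line give r = +1 or -1 whatever the slope, while a curved but monotonic relationship gives r < 1.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    # The common 1/(n - 1) factors of covariance and standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 7 for v in x]))     # exact line, positive slope
print(pearson_r(x, [-0.5 * v + 2 for v in x]))  # exact line, negative slope
print(pearson_r(x, [v ** 2 for v in x]))        # monotonic but curved
```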

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations; in other words, ρ is Pearson's r computed on the ranks.
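
The same idea can be written in Python for Spearman's ρ: rank each variable, then apply Pearson's r to the ranks. This sketch gives tied values the average of the ranks they span, which is the usual convention and pandas' default ranking method. The helper names are illustrative. A monotonic but nonlinear relationship reaches ρ = 1, while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
print(average_ranks([10, 20, 20, 30]))
```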

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

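To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant when both variables order its two observations the same way, and discordant when they order them oppositely. In its basic form, τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.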
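
A minimal O(n²) sketch of this basic form (often written τ_a) follows. The function name is illustrative. Library implementations commonly use the τ_b variant, which adjusts the denominator for ties, together with a faster algorithm, so results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair oppositely
        # s == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant
```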

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report therefore uses the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
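
The sketch below implements the bias correction as it is commonly stated for Bergsma (2013), for a table with r rows and k columns:

1. φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and clipped at 0.
2. r and k are corrected in the same way before normalising.

It is written from that formula, without continuity correction, and is not necessarily identical to the report's implementation. It assumes at least two rows and two columns, and no empty row or column. The function name is illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(k)]
    # Pearson chi-squared statistic against the independence model
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections of phi^2 and of the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```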

Missing values


Sample

First rows

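| df_index | date | averageprice | total volume | 4046 | 4225 | 4770 | total bags | small bags | large bags | xlarge bags | type | year | region | month | day
0 | 0 | 2015-12-27 | 1.33 | 64236 | 1036 | 54454 | 48 | 8696 | 8603 | 93 | 0 | conventional | 2015 | Albany | 12 | 27
1 | 1 | 2015-12-20 | 1.35 | 54876 | 674 | 44638 | 58 | 9505 | 9408 | 97 | 0 | conventional | 2015 | Albany | 12 | 20
2 | 2 | 2015-12-13 | 0.93 | 118220 | 794 | 109149 | 130 | 8145 | 8042 | 103 | 0 | conventional | 2015 | Albany | 12 | 13
3 | 3 | 2015-12-06 | 1.08 | 78992 | 1132 | 71976 | 72 | 5811 | 5677 | 133 | 0 | conventional | 2015 | Albany | 12 | 6
4 | 4 | 2015-11-29 | 1.28 | 51039 | 941 | 43838 | 75 | 6183 | 5986 | 197 | 0 | conventional | 2015 | Albany | 11 | 29
5 | 5 | 2015-11-22 | 1.26 | 55979 | 1184 | 48067 | 43 | 6683 | 6556 | 127 | 0 | conventional | 2015 | Albany | 11 | 22
6 | 6 | 2015-11-15 | 0.99 | 83453 | 1368 | 73672 | 93 | 8318 | 8196 | 122 | 0 | conventional | 2015 | Albany | 11 | 15
7 | 7 | 2015-11-08 | 0.98 | 109428 | 703 | 101815 | 80 | 6829 | 6266 | 562 | 0 | conventional | 2015 | Albany | 11 | 8
8 | 8 | 2015-11-01 | 1.02 | 99811 | 1022 | 87315 | 85 | 11388 | 11104 | 283 | 0 | conventional | 2015 | Albany | 11 | 1
9 | 9 | 2015-10-25 | 1.07 | 74338 | 842 | 64757 | 113 | 8625 | 8061 | 564 | 0 | conventional | 2015 | Albany | 10 | 25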

Last rows

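| df_index | date | averageprice | total volume | 4046 | 4225 | 4770 | total bags | small bags | large bags | xlarge bags | type | year | region | month | day
18239 | 2 | 2018-03-11 | 1.56 | 22128 | 2162 | 3194 | 8 | 16762 | 16510 | 252 | 0 | organic | 2018 | WestTexNewMexico | 3 | 11
18240 | 3 | 2018-03-04 | 1.54 | 17393 | 1832 | 1905 | 0 | 13655 | 13401 | 253 | 0 | organic | 2018 | WestTexNewMexico | 3 | 4
18241 | 4 | 2018-02-25 | 1.57 | 18421 | 1974 | 2482 | 0 | 13964 | 13698 | 266 | 0 | organic | 2018 | WestTexNewMexico | 2 | 25
18242 | 5 | 2018-02-18 | 1.56 | 17597 | 1892 | 1928 | 0 | 13776 | 13553 | 223 | 0 | organic | 2018 | WestTexNewMexico | 2 | 18
18243 | 6 | 2018-02-11 | 1.57 | 15986 | 1924 | 1368 | 0 | 12693 | 12437 | 256 | 0 | organic | 2018 | WestTexNewMexico | 2 | 11
18244 | 7 | 2018-02-04 | 1.63 | 17074 | 2046 | 1529 | 0 | 13498 | 13066 | 431 | 0 | organic | 2018 | WestTexNewMexico | 2 | 4
18245 | 8 | 2018-01-28 | 1.71 | 13888 | 1191 | 3431 | 0 | 9264 | 8940 | 324 | 0 | organic | 2018 | WestTexNewMexico | 1 | 28
18246 | 9 | 2018-01-21 | 1.87 | 13766 | 1191 | 2452 | 727 | 9394 | 9351 | 42 | 0 | organic | 2018 | WestTexNewMexico | 1 | 21
18247 | 10 | 2018-01-14 | 1.93 | 16205 | 1527 | 2981 | 727 | 10969 | 10919 | 50 | 0 | organic | 2018 | WestTexNewMexico | 1 | 14
18248 | 11 | 2018-01-07 | 1.62 | 17489 | 2894 | 2356 | 224 | 12014 | 11988 | 26 | 0 | organic | 2018 | WestTexNewMexico | 1 | 7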